Visualizing Bagged Decision Trees
Authors

William J.E. Potts, Professional Services Division, SAS Institute Inc., [email protected]
Abstract
We present a visual tablet for exploring the nature of a bagged decision tree (Breiman [1996]). Aggregating classifiers over bootstrap datasets (bagging) can result in greatly improved prediction accuracy. Bagging is motivated as a variance reduction technique, but it is considered a black box with respect to interpretation. Current research seeking to explain why bagging works has focused on different bias/variance decompositions of prediction error. We show that bagging's complexity can be better understood by a simple graphical technique that allows visualizing the bagged decision boundary in low-dimensional situations. We then show that bagging can be heuristically motivated as a method to enhance local adaptivity of the boundary. Some simulated examples are presented to illustrate the technique.

Decision trees are flexible classifiers with simple and interpretable structures (Ripley [1996]). The best known methods for constructing decision trees are CART (Breiman et al. [1984]) and C4.5 (Quinlan [1993]). Consider a learning sample C consisting of a p-vector of input variables and a class label for each of n cases. Tree-structured classifiers recursively partition the input space into rectangular regions with different class assignments. The resulting partition can be represented as a simple decision tree. These models are, however, unstable to small perturbations in the learning sample; that is, different data can give very different looking trees.

Breiman [1996a] introduced bagging (bootstrap aggregation) as a method to enhance the accuracy of unstable classification methods like decision trees. In bagging, B bootstrap (Efron and Tibshirani [1993]) datasets are generated, each consisting of n cases drawn at random with replacement from C. A decision tree is built for each of the B samples. The predicted class corresponding to a new input is obtained by a plurality vote among the B classifiers. Consequently, each new case must be run down each of the B decision trees and a running tally kept of the results. Bagging decision trees has been shown to lead to consistent improvements in prediction accuracy (Breiman [1996a,b], Quinlan [1996]).

Bagging takes advantage of instability to improve the accuracy of the classification rule, but in the process destroys the simple interpretation of a single decision tree. Bagging stable classifiers can, however, actually increase prediction error (Breiman [1996a]). A flurry of current work to understand the theoretical nature of bagging has focused on different bias/variance decompositions of prediction error (Breiman [1996a,b], Friedman [1996], Tibshirani [1996], Kohavi and Wolpert [1996], James and Hastie [1997]). For simple risk functions like squared error loss, bagging can be shown to improve prediction accuracy through variance reduction. But due to the non-convexity of a 0-1 misclassification loss function, there is not a simple additive breakdown of prediction error into bias plus variance. What has been shown is that there is an interesting interaction between (boundary) bias (the decision rule produced relative to the gold-standard Bayes rule) and the variance of the classifier, and that depending on the magnitude and sign of the bias, bagging can help or do harm.
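The bagging procedure described above (B bootstrap samples of the n cases, one tree per sample, a plurality vote over the B trees) can be written in a few lines. The following is a minimal sketch, assuming integer class labels and scikit-learn's DecisionTreeClassifier as the base learner; the helper names bagged_trees and plurality_vote are illustrative, not from the paper.

```python
# Minimal sketch of bagging decision trees: fit B trees on bootstrap
# samples and classify new cases by plurality vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_trees(X, y, B=50, random_state=0):
    """Fit B trees, each on a bootstrap sample of the n cases in (X, y)."""
    rng = np.random.default_rng(random_state)
    n = len(y)
    trees = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)  # draw n cases at random with replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def plurality_vote(trees, X_new):
    """Run each new case down all B trees and return the most frequent class."""
    votes = np.stack([t.predict(X_new) for t in trees])  # shape (B, n_new)
    # tally the B votes per case; assumes classes are coded 0, 1, ...
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```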
Leaving aside the algebraic decompositions, bagging is generally regarded as a black box: its inner workings cannot be easily visualized or interpreted. In this paper, we use a new graphical display called a classification aggregation tablet scan, or CAT scan, to visualize the bagging process for low-dimensional problems. This is a general graphic that can be applied to any aggregated classifier. Here, however, we focus on decision trees for the two-class discrimination problem with two-dimensional input vectors.
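A CAT-scan-style picture of the bagged boundary in two dimensions can be approximated by evaluating the ensemble vote over a dense grid of inputs and shading by the vote proportion. The sketch below only illustrates that idea for the two-class, two-input case; it reuses the hypothetical bagged_trees helper from the previous sketch and matplotlib, and is not the exact graphic defined in the paper.

```python
# Rough sketch: evaluate the fraction of trees voting for class 1 on a grid
# and draw the 0.5 contour as the bagged decision boundary.
import numpy as np
import matplotlib.pyplot as plt

def plot_bagged_boundary(trees, X, y, steps=200):
    x1 = np.linspace(X[:, 0].min(), X[:, 0].max(), steps)
    x2 = np.linspace(X[:, 1].min(), X[:, 1].max(), steps)
    g1, g2 = np.meshgrid(x1, x2)
    grid = np.column_stack([g1.ravel(), g2.ravel()])
    # fraction of the B trees voting for class 1 at each grid point
    vote = np.mean([t.predict(grid) for t in trees], axis=0).reshape(g1.shape)
    plt.contourf(g1, g2, vote, levels=20, cmap="RdBu_r", alpha=0.6)
    plt.contour(g1, g2, vote, levels=[0.5], colors="black")  # bagged boundary
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap="RdBu_r", edgecolor="k", s=15)
    plt.xlabel("x1"); plt.ylabel("x2")
    plt.show()
```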
Similar Resources
Using HMMs and bagged decision trees to leverage rich features of user and skill from an intelligent tutoring system dataset
This article describes the user modeling, feature extraction and bagged decision tree methods that were used to win 2nd place student prize and 4th place overall in the ACM's 2010 KDD Cup.
Improved Class Probability Estimates from Decision Tree Models
Decision tree models typically give good classification decisions but poor probability estimates. In many applications, it is important to have good probability estimates as well. This paper introduces a new algorithm, Bagged Lazy Option Trees (B-LOTs), for constructing decision trees and compares it to an alternative, Bagged Probability Estimation Trees (B-PETs). The quality of the class proba...
An Empirical Evaluation of Supervised Learning for ROC Area
We present an empirical comparison of the AUC performance of seven supervised learning methods: SVMs, neural nets, decision trees, k-nearest neighbor, bagged trees, boosted trees, and boosted stumps. Overall, boosted trees have the best average AUC performance, followed by bagged trees, neural nets and SVMs. We then present an ensemble selection method that yields even better AUC. Ensembles are...
Bagging Soft Decision Trees
The decision tree is one of the earliest predictive models in machine learning. In the soft decision tree, based on the hierarchical mixture of experts model, internal binary nodes take soft decisions and choose both children with probabilities given by a sigmoid gating function. Hence for an input, all the paths to all the leaves are traversed and all those leaves contribute to the final decis...
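For contrast with the hard splits of CART-style trees, the soft-decision idea sketched in this abstract can be written as a small recursion: each internal node routes a case to both children with probabilities given by a sigmoid gate, and every leaf contributes in proportion to the probability of the path reaching it. The snippet below is only a hedged illustration of that general mechanism (the node structure, field names, and gate parameterization are assumptions), not the algorithm of the cited paper.

```python
# Illustrative soft decision tree evaluation: each internal node sends the
# input down both branches, weighted by a sigmoid gate, so all leaves
# contribute to the final prediction.
import numpy as np
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    w: Optional[np.ndarray] = None   # gate weights (internal nodes only)
    b: float = 0.0                   # gate bias
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0               # leaf response (leaves only)

def soft_predict(node, x):
    if node.left is None:            # leaf: contribute its value
        return node.value
    p_left = 1.0 / (1.0 + np.exp(-(node.w @ x + node.b)))  # sigmoid gate
    # weight each subtree by the probability of taking that branch
    return (p_left * soft_predict(node.left, x)
            + (1.0 - p_left) * soft_predict(node.right, x))
```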
Bagging tree classifiers for laser scanning images: a data- and simulation-based strategy
Diagnosis based on medical image data is common in medical decision making and clinical routine. We discuss a strategy to derive a classifier with good performance on clinical image data and to justify the properties of the classifier by an adapted simulation model of image data. We focus on the problem of classifying eyes as normal or glaucomatous based on 62 routine explanatory variables deri...
A comparison of stacking with meta decision trees to other combining methods
Meta decision trees (MDTs) are a method for combining multiple classifiers. We present an integration of the algorithm MLC4.5 for learning MDTs into the Weka data mining suite. We compare classifier ensembles combined with MDTs to bagged and boosted decision trees, and to classifier ensembles combined with other methods: voting, grading, multi-scheme and stacking with multi-response linear regr...